Abstract

The distances over which biological molecules and their complexes can function range from a 
few nanometres, in the case of folded structures, to millimetres, for example during chromosome 
organization. Describing phenomena that cover such diverse length and time scales requires
models that capture the underlying physics for the particular length scale of interest. Theoretical 
ideas, in particular, concepts from polymer physics, have guided the development of coarse-grained 
models to study folding of DNA, RNA, and proteins. More recently, such models and their variants 
have been applied to the functions of biological nanomachines. Simulations using coarse-grained 
models are now poised to address a wide range of problems in biology. 
Minimal models that capture the essence of complex phenomena have a rich history in
the natural sciences. In condensed matter physics, insights into many phenomena have
emerged from analytic theories of models that use effective many-body Hamiltonians to
succinctly capture the essence of the problems [1]. Examples include phase transitions,
superfluidity, and superconductivity. However, complex problems such as spin glasses [2],
structural glasses [3], [4], and a host of problems in biology, such as protein and RNA folding
and the functions of macromolecules, have resisted solutions using purely theoretical methods.
These and other problems in material science in which a wide range of time, energy, and 
length scales are intertwined require well-designed computer simulations, which capture the 
essential features of the systems. Although the temptation to use detailed atomic simulations 
in protein folding and more complicated problems is hard to resist, such an approach has 
given us only limited insights. In contrast, since the first classical molecular dynamics
simulation that reported a phase transition in hard-sphere systems [5], it has been clear that
coarse-grained (CG) models are often the only way to describe phenomena that involve an 
interplay of multiple energy and time scales. Nowhere is the need for CG models greater 
than in biology in which self-assembly of macromolecules and their functions, which involve 
multiple partners, occur on time and length scales that cover many orders of magnitude. In 
the context of protein and RNA folding, simulations using CG models, guided by theoretical 
concepts [6]-[10], have unearthed the principles of self-assembly. More recently, models that
were introduced to describe folding of isolated proteins [11]-[15] and RNA [16] have also been
adopted and extended in novel ways to predict functions of large complexes such as ribosomes
[17], molecular chaperones [18], enzyme catalysis [19], protein insertion into membranes [20],
and a number of motors [21]-[28]. These developments have resulted in a quiet revolution,
which has provided molecular insights into a variety of biological processes. 

In the last two decades, fundamental breakthroughs into structural organization and 
dynamics of proteins, RNA, and DNA have been achieved using theoretical concepts from 
polymer physics [29] and CG simulations. Here, we describe how simulations using a variety 
of CG models have been successful in describing dynamical processes in biology spanning 
a wide range of length scales. These achievements have been further extended to probe 
folding under cellular conditions [30]-[32], and more recently to describe the functional dynamics
of biological nanomachines [HI [22]-[24], [26]. The use of CG models and simple theoretical
ideas has also found fruitful applications in many other areas of biology, such as gene
networks, systems biology, and analysis of complex metabolic pathways.
Length scales determine extent of coarse-graining. 

Description of reality using models requires a level of abstraction, which depends on the 
phenomenon of interest. For example, near a critical point, exponents that describe the
vanishing of the order parameter or the divergence of the correlation length are universal,
depending only on the dimensionality (d), and are impervious to atomic details. These findings,
which are rooted in the concepts of universality and the renormalization group [33], are also
applicable to the properties of polymers [29]. For example, the size of a long homopolymer and
its distribution of end-to-end distances depend only on the solvent quality, the degree of
polymerization, and d, but not on the details of monomer structure [29]. However, on length
scales of the order of a few nm one has to contend with the chemical properties of the
monomer.

In the absence of rigorous theoretical underpinnings, intuitive arguments and phe- 
nomenology come into play in modeling complex biological processes (Box 1). Here, also 
the level of description depends on length scales. In nucleic acids, at short length scales
(l < 5 Å), the detailed chemical environment determines the basic forces (hydrogen bonds and
dispersion forces) between two nucleotides. On the scale l ≈ 1-3 nm, interactions between
two bases, base stacks, and the grooves of the nucleic acids become relevant. Understanding how
RNA folds (l ≈ 1-3 nm) requires energy functions that provide at least a CG description
of nucleotides and of the interactions between them, both in the native state and in the
excitations around the folded structure. On the persistence length scale, l_p ≈ 150 bp ≈ 50 nm [34],
and beyond, it suffices to treat dsDNA as a stiff elastic filament without explicitly capturing
the base pairs. If l ≫ l_p, dsDNA behaves like a self-avoiding polymer [35]. On the scale of
chromosomes (l ~ mm) a much coarser description suffices. Thus, models for DNA, RNA,
and proteins vary because the scale of structural organization changes from nearly mm in
chromatin to several nm in the folded states of RNA and proteins.
Polymer models for dsDNA and chromosome structure.

The length L of double-stranded DNA (dsDNA) exceeds a few μm, whereas its persistence length is
l_p ≈ 50 nm. On these scales global properties of dsDNA, such as the end-to-end distance
and the dependence of l_p on salt concentration, are not greatly affected by fluctuations of
individual base pairs. Consequently, dsDNA can be treated as a fluctuating elastic material,
for which the worm-like chain (WLC) is a suitable polymer model. On much longer scales
(L ~ 1 mm), which are relevant to chromosomes, the genomic material can be described as
a flexible polymer. Using these scale-dependent models a number of predictions for DNA 
organization and dynamics can be made. 

Looping dynamics: Loop formation in biopolymers is an elementary process in the 
self-assembly of DNA, RNA and proteins. However, understanding cyclization kinetics is 
complicated because multiple length scales and internal chain modes are intertwined in 
bringing distant parts of DNA into proximity. For a short chain, the cyclization time τ_c
scales as a power of L, and the exponent changes as L increases [36], [37]. The problem of
cyclization becomes more challenging in the looping dynamics of dsDNA, an elementary process
that is relevant in controlling gene expression and DNA condensation. In the CG model, a single
pitch of the double helix, formed by 10.5 base pairs, is represented by one interaction center
(Fig. 1). Thus, l_p (≈ 150 bp) encompasses about 14-15 CG interaction centers. The parameters
for the bond and bending potentials along the chain, consisting of multiple CG centers, are
selected to reproduce the persistence length of dsDNA [38], [39], allowing us to study various
dynamical processes of dsDNA (stretching, looping, or supercoiling) from the perspective of
polymer physics. The ease of loop formation and the associated kinetics are governed by L/l_p.
For L/l_p ≲ 1, the energy required to bend dsDNA makes cyclization difficult for short
chains. In contrast, when L/l_p ≫ 1, cyclization between the two ends becomes harder because
of the loss of chain entropy. Theory and simulations using the CG model showed that τ_c is
shortest when L/l_p ≈ 2-3 [40]-[42] (Fig. 1). Interestingly, in the looping of dsDNA responsible
for gene regulation in prokaryotes, L ≈ 100 bp (L/l_p ≈ 0.7). For such short dsDNA
(L ≈ 100 bp), sequence effects are also relevant [43]-[45].
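The bead mapping and the tuning of the bending potential can be sketched as follows. This is a minimal illustration under stated assumptions, not the parameterization of [38], [39]: we assume beads joined by rigid bonds of length b = 10.5 bp (one helical turn, as in the text) and a hypothetical bending energy E(θ) = k_b(1 - cos θ) between consecutive bonds, with no other interactions. For such a chain, bond-bond correlations decay as ⟨cos θ⟩^n, so l_p = -b / ln⟨cos θ⟩, and k_b can be adjusted until l_p matches 150 bp. The names `mean_cos`, `persistence_length`, and `fit_bending_stiffness` are illustrative.

```python
import math

BP_PER_BEAD = 10.5   # one helical turn per CG interaction center (from the text)
LP_BP = 150.0        # target persistence length of dsDNA, in base pairs

def mean_cos(k_bend: float) -> float:
    """<cos theta> for a bending energy k_bend*(1 - cos theta), k_bend in units of kT.

    With Boltzmann weight exp(k_bend*cos theta) on the unit sphere this is the
    Langevin function coth(k) - 1/k.
    """
    return 1.0 / math.tanh(k_bend) - 1.0 / k_bend

def persistence_length(k_bend: float, bond: float) -> float:
    # Bond-bond correlations decay as <cos theta>**n, hence l_p = -b / ln<cos theta>.
    return -bond / math.log(mean_cos(k_bend))

def fit_bending_stiffness(lp_target: float, bond: float,
                          lo: float = 1e-3, hi: float = 1e3) -> float:
    """Bisection for k_bend; l_p increases monotonically with the stiffness."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if persistence_length(mid, bond) < lp_target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

beads_per_lp = LP_BP / BP_PER_BEAD            # about 14.3 CG centers per l_p
k_b = fit_bending_stiffness(LP_BP, BP_PER_BEAD)
print(f"{beads_per_lp:.1f} beads per l_p, k_b = {k_b:.2f} kT")
for L_bp in (100, 300, 450):                  # L/l_p = 0.67, 2, 3
    print(L_bp, "bp ->", round(L_bp / LP_BP, 2), "l_p,",
          round(L_bp / BP_PER_BEAD, 1), "beads")
```

With these assumptions the fitted stiffness comes out at roughly 14.8 kT; a real CG model would also include stretching, electrostatic, and excluded-volume terms that shift this value.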
Stretching dsDNA: Fluctuations of dsDNA on scales comparable to l_p can be described
using the WLC model, which parameterizes dsDNA as a polymer that resists bending on
scales ~ l_p. Smith et al. measured the response of a 97 kbp dsDNA (contour length
L ≈ 33.0 μm) from λ-phage to a stretching force, f [SU [46] (Fig. 1). In the absence of
f, λ-DNA conformations are determined by thermal fluctuations, whereas the loss in chain
entropy must be overcome to stretch dsDNA (f ≠ 0). The free energy of stretching a
semiflexible chain under tension is equivalent to that of a quantum mechanical problem of a
dipolar rotor with moment of inertia l_p in an electric field f. An extrapolation formula,
obtained by numerically solving the quantum mechanical problem, accurately describes the
measured force as a function of extension (Fig. 1). Fits to experimental data yield
L = (32.80 ± 0.10) μm and l_p = (53.4 ± 2.3) nm for λ-DNA, thus confirming most directly that
dsDNA is a semiflexible chain.
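The force-extension relation can be sketched with the widely used Marko-Siggia interpolation formula, f l_p / kT = 1/(4(1 - x/L)^2) - 1/4 + x/L. We assume, without confirmation from the text, that this is the extrapolation formula referred to above. The fitted values L = 32.80 μm and l_p = 53.4 nm come from the text; kT = 4.11 pN·nm (room temperature) is our assumption, and `wlc_force` is an illustrative name.

```python
KT_PN_NM = 4.11      # thermal energy at ~298 K in pN*nm (assumed temperature)
LP_NM = 53.4         # fitted persistence length quoted in the text
L_NM = 32.80e3       # fitted contour length quoted in the text (32.80 um)

def wlc_force(x_nm: float, lp: float = LP_NM, L: float = L_NM,
              kT: float = KT_PN_NM) -> float:
    """Marko-Siggia interpolation: f*l_p/kT = 1/(4(1-x/L)^2) - 1/4 + x/L."""
    z = x_nm / L
    if not 0.0 <= z < 1.0:
        raise ValueError("extension must satisfy 0 <= x < L")
    return (kT / lp) * (0.25 / (1.0 - z) ** 2 - 0.25 + z)

for frac in (0.5, 0.8, 0.9, 0.95):
    print(f"x/L = {frac:.2f}: f = {wlc_force(frac * L_NM):.3f} pN")
```

The force diverges as x approaches L, which reflects the entropic cost of removing the last transverse fluctuations of the chain.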

Confined polymers and bacterial chromosome segregation: Replication and 
passage of genetic information to daughter cells are major events in cell reproduction. These 
complex events are remarkably accurate even in simple organisms. Although chromosome 
segregation is likely to be complex and well orchestrated, it has recently been proposed that 
confinement-induced entropic forces due to restrictions in cellular space are sufficient to drive
chromosome segregation in bacteria [47], [48] (Fig. 1). This proposal was formulated using
molecular simulations of tightly confined self-avoiding polymer chains in cylindrical space,
which show that the chains segregate and become spatially organized in a manner reminiscent
of that observed in bacteria. In such highly confined spaces, polymer conformations are
determined by ξ, the size of a renormalized structural unit; the Flory radius R_F in the absence
of confinement; and the length (P) and diameter (D) of the cylinder. In E. coli, ξ ≈ 87 nm,
R_F ≈ 3.3 μm, and D and P are 0.24 μm and 1.3 μm, respectively. Armed with
the results for confined polymers, a concentric shell model for the bacterial chromosome was
proposed [47], [48], in which the nucleoid was modeled as an inner and an outer cylinder. The
unreplicated "mother" strand, a self-avoiding chain, is restricted to the inner compartment,
whereas the "daughter" chain (generated in the Monte Carlo simulations by adding monomers at
set times) is free to explore the entire nucleoid volume. The results of the simulations show
that the newly added (or replicated) chain segregates to the periphery of the nucleoid, driven
by a gain in entropy, and becomes spatially organized as it is synthesized (Fig. 1). CG modeling
combined with polymer theory led to the discovery that entropic forces alone are sufficient to
drive chromosome segregation in bacteria, with proteins perhaps playing a secondary role in
poising the state of the chromosome for enabling the entropy-driven mechanism.

Chromosome Folding: In eukaryotic cells chromosomes fold into globules that spatially 
occupy well-defined volumes known as chromosome territories [49]. In this process widely 
separated gene-rich regions are brought into close proximity. Knowledge of the spatial 
arrangement of chromosomes is important in describing gene activity and the state of the cell. 
Polymer physics concepts have been used to describe the structures of folded chromosomes
using constraints derived from experiments. These calculations have provided considerable
insights into their compartmentalization in the nucleus [50]. A number of models, such as the
random walk model and models that connect megabase-sized domains by chromatin loops,
have been used to describe higher-order structures of chromatin. The experimental resolution
is roughly 1 Mb (≈ 340 μm of contour length), and consequently coarse-graining in this context
must be on length scales of the order of a μm. Recently, folding principles for the human genome
were proposed using data for long-range contacts between distinct loci as constraints [51].
Experiments showed that the contact probability, I(s), between loci in a chromosome that are
separated by genomic distance s (measured in bp) exhibits a power-law decay in the range
≈ 500 kb to ≈ 7 Mb. The observed dependence I(s) ~ s^{-1} can be rationalized using polymer
models (Fig. 1) introduced a number of years ago to describe the collapse of homopolymers
[52]. If the chromosome folded into an equilibrium globule (a polymer in a poor solvent), then
I(s) ~ s^{-3/2}, which cannot account for the experimental observations. An alternate model
suggests that interphase DNA can organize itself into a fractal globule, which is compact but,
unlike an equilibrium globule, not entangled. Monte Carlo simulations of a polymer
with 4000 beads (1 bead = 1200 bp ≈ 0.4 μm) were used to generate conformations of fractal
and equilibrium globules. The power-law decay of I(s), with exponent -1, is consistent
with measurements (Fig. 1). More importantly, in the unknotted fractal globule, loci that are
close in genomic sequence are also in proximity in three-dimensional space,
which is clearly relevant for gene activity.
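The exponent that distinguishes the two globule models is the slope of log I(s) against log s. A minimal sketch of that fit follows, run only on synthetic power laws (no experimental data is used): I(s) ∝ s^-1 for the fractal globule, as stated above, and I(s) ∝ s^-3/2, which we take as the standard equilibrium-globule result. The helper name `contact_exponent` and the prefactor 1e-2 are illustrative.

```python
import numpy as np

def contact_exponent(s_bp: np.ndarray, contact_prob: np.ndarray) -> float:
    """Slope of log I(s) versus log s from a least-squares straight-line fit."""
    slope, _intercept = np.polyfit(np.log(s_bp), np.log(contact_prob), 1)
    return float(slope)

# Synthetic curves only: genomic distances from 500 kb to 7 Mb.
s = np.logspace(np.log10(5e5), np.log10(7e6), 40)
fractal = 1e-2 * (s / 5e5) ** -1.0        # fractal globule, I(s) ~ s^-1
equilibrium = 1e-2 * (s / 5e5) ** -1.5    # equilibrium globule, I(s) ~ s^-3/2

print("fractal globule exponent:", round(contact_exponent(s, fractal), 3))
print("equilibrium globule exponent:", round(contact_exponent(s, equilibrium), 3))
```

On real contact maps the fit would be restricted to the range where the decay is actually a power law (here, 500 kb to 7 Mb), and noisy counts would call for binning before taking logarithms.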
RNA Folding



Since the discovery that RNA molecules can serve as enzymes, there has been great impetus to
describe their folding in quantitative terms. The RNA folding landscape is rugged because of the
interplay of several competing factors. First, the phosphate groups are negatively charged,
which implies that polyelectrolyte effects oppose folding. The valence, size, and shape of
counterions, which are necessary to induce compaction and folding [53], can dramatically alter
the thermodynamics and kinetics of RNA folding. Second, the purine and pyrimidine bases
have different sizes but are chemically similar. Third, only ≈ 46% of bases
form canonical Watson-Crick base pairs, while the remaining nucleotides are in non-pairing
regions [54]. Fourth, the lack of chemical diversity in the bases results in RNA easily
adopting alternate misfolded conformations, which means that the stability gap between 
the folded and misfolded structures is not too large. Thus, the homopolymer nature of the 
RNA monomers, the critical role of counterions in shaping the folding landscape, and the 
presence of low-energy excitations around the folded state make RNA folding a challenging 
problem [6]. 

Polyelectrolyte (PE) effects: To fold, RNA must overcome the large electrostatic 
repulsion between the negatively charged phosphate groups. PE based theory shows that 
multivalent cations (Z > 1) are more efficient in neutralizing the backbone charges than
monovalent ions, a prediction that is borne out in experiments. The midpoint of the
folding transition, C_m, the ion concentration at which the folded and unfolded states are
equally populated, is ~3 x 10^ fold greater for the Tetrahymena ribozyme in Na+ than in
cobalt hexammine (Z = 3). The nature of compact structures depends on Z, with the radius of
gyration R_g decreasing as an inverse power of Z, which implies that compact intermediates have
larger free energy as Z increases. Thus, folding rates should decrease as Z increases, which also
accords well with experiments [55]. Polyelectrolyte theory also shows that the counterion charge
density ζ = Ze/V should control RNA stability. As ζ increases, RNA stability should increase,
a prediction that was validated using a combination of PE-based simulations and experiments.
The changes in stability of the Tetrahymena ribozyme in various Group II metal ions (Mg2+, Ca2+,
Ba2+, and Sr2+) showed a remarkable linear variation with ζ [56]. The extent of stability is
largest for ions with the largest ζ (smallest V). Brownian dynamics simulations showed that this
effect could be captured solely by non-specific ion-RNA interactions [56]. These findings
and similar variations of stability in different sized diamines show that (i) the bulk of the 
stability arises from non-specific association of ions with RNA, and (ii) stability can be 
greatly altered by valence, shape, and size of the counterions. 
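The charge density ζ = Ze/V can be computed directly from ionic radii to rank the Group II ions. The sketch below assumes bare Shannon ionic radii for six-coordinate ions (Mg2+ 0.72 Å, Ca2+ 1.00 Å, Sr2+ 1.18 Å, Ba2+ 1.35 Å); whether bare or hydrated radii best define V for RNA stability is not settled by the text, so the numbers are illustrative and only the ordering matters. `charge_density` and `RADII_A` are hypothetical names.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge in coulombs

# Assumed bare ionic radii (Shannon, six-coordinate), in angstrom; illustrative only.
RADII_A = {"Mg2+": 0.72, "Ca2+": 1.00, "Sr2+": 1.18, "Ba2+": 1.35}

def charge_density(z: int, radius_a: float) -> float:
    """zeta = Z e / V for a spherical ion of radius r, in C per cubic angstrom."""
    volume = 4.0 / 3.0 * math.pi * radius_a ** 3
    return z * E_CHARGE / volume

zeta = {ion: charge_density(2, r) for ion, r in RADII_A.items()}
ranking = sorted(zeta, key=zeta.get, reverse=True)
for ion in ranking:
    print(f"{ion}: zeta = {zeta[ion]:.3e} C/A^3")
```

Because all four ions carry Z = 2, the ranking is set entirely by the volume, so the smallest ion (Mg2+) has the largest ζ and, by the argument above, should stabilize the RNA most.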

Structures of RNA intermediates: The complete characterization of counterion-mediated
RNA folding requires a structural description of the unfolded (U), intermediate (I),
and folded states. Structures of the folded states can be obtained using crystallography
or NMR. However, it is difficult to characterize the ensemble of structures populated at
low (U) and moderate (I) ion concentrations (C). To obtain the ensemble of I structures
from time-resolved SAXS data, a CG model for the Tetrahymena group I ribozyme was constructed by
representing 5-6 nucleotide pairs by a single sphere (Fig. 2) [57]. The salient findings are:
(i) At times prior to global collapse, the domains of the ribozyme are extended because
PE effects dominate. (ii) On time scales that are much shorter than the overall folding time,
there is a drastic reduction in the size of the RNA. The folding intermediates are fluid-like and
must be a mixture of species that contain specifically collapsed structures (large degree of 
native-like order) and non-specifically collapsed conformations (low degree of native-like 
order). 

Complexity of hairpin formation: When viewed on length scales that span several
base pairs, folding of a small RNA (or DNA) hairpin is remarkably simple. However, when probed
on short time scales (ns to μs range), the formation of a small hairpin involving turn formation
and base stacking is remarkably complex. Recent experiments show that the kinetics of
hairpin formation in RNA (or ssDNA) deviate from classical two-state kinetics and are
best described as a multi-step process [58]. Additional facets of hairpin formation have been
revealed in single molecule experiments that use mechanical force (f). These experiments
prompted simulations that vary both T and f. The equilibrium phase diagram showed two
basins of attraction (folded and unfolded) at the locus of critical points (T_m, f_m). At T_m
and f_m, the probabilities of being unfolded and folded are the same. The free energy surface
obtained from simulations explained the sharp bimodal transition between the folded and
unfolded states when the RNA hairpin is subject to f [HI [59]. Thus, from thermodynamic
considerations, hairpin formation can be described as a two-state system. 
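The two-state thermodynamic picture can be sketched with the standard expression in which a force f lowers the free energy of the unfolded state by f·Δx. The stability ΔG (164 pN·nm, about 40 kT) and the extension change Δx (18 nm) are hypothetical values chosen for illustration, not numbers from the simulations cited above; `p_unfolded` is an illustrative name, and kT = 4.11 pN·nm is assumed.

```python
import math

KT = 4.11  # thermal energy in pN*nm at ~298 K (assumed)

def p_unfolded(force_pn: float, dG_pn_nm: float, dx_nm: float, kT: float = KT) -> float:
    """Two-state population of U when the force tilts the free energy by -f*dx.

    dG_pn_nm: G_U - G_F at zero force (hypothetical; > 0 for a stable hairpin).
    dx_nm: separation of the F and U basins along the extension (hypothetical).
    """
    dG_f = dG_pn_nm - force_pn * dx_nm   # stability of F at force f
    return 1.0 / (1.0 + math.exp(dG_f / kT))

DG, DX = 164.0, 18.0   # hypothetical hairpin parameters
f_m = DG / DX          # midpoint force, where P_U = P_F = 1/2
for f in (f_m - 1.0, f_m, f_m + 1.0):
    print(f"f = {f:.2f} pN: P_U = {p_unfolded(f, DG, DX):.3f}")
```

The width of the transition is of order kT/Δx (about 0.2 pN here), so a change of only 1 pN around f_m switches the hairpin almost completely between the two basins, which is the sharp bimodal behavior described above.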
Upon a temperature quench, the hairpin forms in multiple steps [59], as observed in the recent
kinetic experiments. Folding pathways differ between T-quench and f-quench refolding
(see Fig. 2). The initial conformations generated by forced unfolding are fully extended and
structurally homogeneous. The first event in folding upon an f-quench is loop formation,
which is a slow nucleation process (see Fig. 2). Zipping of the remaining base pairs
leads to rapid hairpin formation. Refolding upon a T-quench commences from a structurally
broad ensemble of unfolded conformations. Therefore, nucleation can originate from many
regions in the molecule (see Fig. 2). The simulations showed that the complexity of the
folding landscape observed in ribozyme experiments is already reflected in the formation
of a simple RNA hairpin [16], [60], just as β-hairpin formation captures much of the complexity
of protein folding [61].

Protein Folding 

The impetus to understand the mechanisms of protein folding comes from a number of 
different sources. First, there is an increasing need to produce models that can predict folding
thermodynamics and kinetics at conditions used in experiments. Second, it is urgent to 
describe the biophysical basis of misfolding and the link to neurodegenerative diseases. 
Third, as we move towards a system level description of cellular processes it is important 
to develop theoretical models for describing folding in crowded solutions as well as folding 
of proteins as they are synthesized by the ribosome. 

Molecular Transfer Model (MTM): The validity of models can only be assessed by
comparing simulation results (obtained under conditions used in experiments) to experiments.
The majority of computational studies use temperature to trigger folding and unfolding,
whereas a substantial number of experiments use denaturants for the same purpose, thus
making it difficult to validate the models. This difficulty has been overcome with the
introduction of a phenomenological MTM [62], [63]. In the MTM, simulations are performed in
condition A (for example, fixed temperature T and zero denaturant concentration), and the
sampled conformations are assigned appropriate Boltzmann weights such that the behavior
in solution condition B (for example, T and non-zero denaturant concentration) can be
accurately predicted without running additional simulations. The MTM theory shows that
this procedure is exact provided the conformations of the protein are exhaustively sampled
in condition A, and is limited only by the accuracy of the force fields. Applications of
the MTM require the free energy cost of transferring a given protein conformation from A to B,
which was taken from experimentally measured transfer free energies for the peptide
backbone and each amino acid. The MTM simulations for protein L (an α/β protein) and
the nearly all β-sheet cold shock protein quantitatively reproduced the measured
population of the folded state as a function of denaturant concentration (Fig. 3).

Surprisingly, MTM-based simulations also accurately predicted denaturant-dependent
measurements in single molecule experiments.
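The reweighting step at the heart of the MTM can be sketched as follows. This is a minimal illustration, not the published implementation: it assumes the conformations are Boltzmann-distributed in condition A at temperature T, so an observable in condition B is the average over the same conformations weighted by exp(-ΔG_tr/kT). In the MTM itself ΔG_tr for each conformation is assembled from measured backbone and side-chain transfer free energies; here we replace that with a hypothetical linear model ΔG_tr = m_k·C, with invented m-values, and the native/unfolded labels are random stand-ins for simulation output.

```python
import numpy as np

def folded_fraction(is_native: np.ndarray, dg_transfer: np.ndarray, kT: float) -> float:
    """Reweight conformations sampled in condition A to estimate f_N in condition B.

    is_native: boolean flag per sampled conformation (from the simulation at A).
    dg_transfer: free energy of transferring each conformation from A to B.
    Assumes the conformations are Boltzmann-distributed in A at the same T.
    """
    log_w = -dg_transfer / kT
    log_w -= log_w.max()            # guard against overflow before exponentiating
    w = np.exp(log_w)
    return float(w[is_native].sum() / w.sum())

rng = np.random.default_rng(0)
n = 10_000
is_native = rng.random(n) < 0.9     # hypothetical: ~90% native at zero denaturant
# Hypothetical m-values: unfolded conformations expose more surface, so their
# transfer free energy into denaturant is more favourable (more negative).
m = np.where(is_native, -0.5, -2.0) # kcal/mol/M, illustrative only
kT = 0.593                          # kcal/mol at ~298 K
for conc in (0.0, 1.0, 2.0, 3.0):
    print(conc, round(folded_fraction(is_native, m * conc, kT), 3))
```

No new simulation is run for each denaturant concentration; only the weights change, which is what makes the approach inexpensive once condition A is well sampled.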

Mechanical force in protein folding: A number of single molecule experiments, 
which use mechanical force (/) in various modes (force ramp, force quench, and constant 
force) to initiate folding from arbitrary regions in the energy landscape, have given a new 
perspective on protein (and RNA) folding [64], [65]. These experiments, which monitor
time-dependent changes in the extension, x(t), of the protein of interest, showed that
folding occurs in multiple stages upon force quench. The power of these experiments
is fully realized only by combining them with theory [66]-[69] and simulations. Such
an approach was used to construct the folding landscape of the nearly 250-residue green
fluorescent protein (GFP), which has a barrel-shaped structure consisting of 11 β-strands
with one α-helix in the C-terminus. Using simulations with the self-organized polymer (SOP)
representation [70] of GFP at the loading rate used in experiments, a rich and complex
folding landscape was predicted (Fig. 3). Unfolding of the native state (N) began with rupture
of the α-helix, leading to the [GFPΔα] intermediate. Subsequently, there was a bifurcation in
the unfolding pathways. In most cases, the route to the unfolded state (U) involved population
of two additional intermediates, [GFPΔαΔβ1] (Δβ1 represents forced rupture of the N-terminal
β-strand) and [GFPΔαΔβ1Δβ2β3]. The most striking prediction of the simulations was
that the minor pathway had only one intermediate, [GFPΔαΔβ11], besides [GFPΔα]. The
predictions using SOP simulations of GFP were quantitatively validated by single molecule
experiments [71].

Cotranslational folding: With the determination of the ribosome structures [72], [73],
there is great interest in the folding of proteins as they are synthesized. Upon synthesis,
which occurs at a rate of about 20 amino acids per second in E. coli, the polypeptide
chain traverses a roughly cylindrical tunnel whose lining changes from the peptidyl transfer
center (PTC) to the exit, which is ~10 nm from the PTC (Fig. 3). Experiments have shown
that certain regions are likely to accommodate α-helices depending on the sequence,
which is of particular interest for transmembrane helices that can be directly inserted into
the membrane by the translocon. Inspired by these experiments, theory and simulations
were used to show that the extent of helix formation depends on the sequence [74], the
diameter of the tunnel, and potential interactions between the nucleotides and residues that 
line the tunnel and the polypeptide chain. 

More recently, several experiments have probed the possibility of tertiary structure 
formation especially in the vestibule near the exit tunnel, whose volume is large enough for 
tertiary structure formation of the N-terminal region of the protein. CG simulations, which
use either a Cα model [75] or a Cα-SCM [76] for the protein, and an all-atom, TIS, or four-site
representation of the RNA, have been used to interrogate coupled synthesis and folding (Fig. 3).
Some general results were found in these simulations. (1) Polypeptide synthesis and folding
are not coupled for single domain proteins, which require the synthesis of the complete protein
for folding to commence. (2) However, cotranslational folding is prevalent in multi-domain
proteins, in which the N-terminal region is likely to fold as it exits the tunnel. In this case
the in vivo folding pathway is expected to be different from that in vitro. (3) Simulations also
suggest that interaction with the ribosome surface decreases folding pathway diversity and
results in a more compact transition state structure [76].

Towards folding under cellular conditions 

The cellular interior is replete with a host of macromolecules, which can alter all processes
ranging from transcription to folding. For example, in E. coli, ribosomes (≈ 20.8 nm),
polymerases, and other protein complexes occupy merely 22% of the total volume, and small
complexes constitute about 8% of the total volume. Thus, unlike in in vitro experiments, where
folding is studied in aqueous solution corresponding to infinite dilution, crowding effects have
to be taken into account when describing behavior in vivo.
A simple calculation shows that the average spacing between cytoplasmic proteins is ≈ 4 nm
[77] (macromolecular concentrations ρ ≈ 50-400 mg/ml [78]). Given that the diameter of a typical
protein (≈ 300 aa) is ≈ 4 nm, the cell is an extremely crowded place, which severely inhibits
conformational fluctuations that are easily realized in typical in vitro experiments.

An approximate mimic of the cellular environment can be realized by adding high
concentrations of natural or synthetic macromolecules. Consider the simplest case of a crowding
agent with radius R_c (Ficoll 70, for example) that is inert towards the protein or RNA. The
volume fraction φ_c = ρv, where v (= 4πR_c^3/3) is the volume of the crowding particle, can be
altered by changing R_c even at fixed ρ. The first crowding simulation used a Cα-SCM of a
β-sheet protein in the presence of spherical crowding particles with φ_c up to 0.25 [32]. The
CG simulations showed that when excluded-volume interactions dominate, the stabilities
of globular proteins relative to φ_c = 0 are enhanced [32] (Fig. 4). The extent of stability
change, measured using ΔT_F = T_F(φ_c) - T_F(φ_c = 0), showed that ΔT_F ∝ φ_c^{1/(3ν)}, where
ν (= 3/5) is the Flory exponent that characterizes the size of the unfolded states of
proteins. The scaling of ΔT_F(φ_c) with φ_c^{1/(3ν)} has been confirmed in recent experiments [79].
Simulations also showed that the folding rate k_F(φ_c) is affected when φ_c ≠ 0. The rate
k_F(φ_c) increases monotonically up to an optimum value of φ_c and subsequently decreases (Fig. 4).
Interestingly, identical behavior was observed in the dependence of the relaxation rate of
phosphoglycerate kinase (PGK) on Ficoll concentration. The simulation results
on crowding-induced effects on the smaller β-sheet WW domain explain several aspects of
folding of PGK (Fig. 4).
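The two relations in this paragraph, φ_c = ρv and the stability scaling, can be combined in a short sketch. We assume ΔT_F ∝ φ_c^{1/(3ν)} with ν = 3/5, so the exponent is 5/9; the crowder number density and radii are hypothetical values chosen only to keep φ_c in the simulated range. `volume_fraction` and `delta_tf_ratio` are illustrative names.

```python
import math

NU = 3.0 / 5.0   # Flory exponent of the unfolded state (value quoted in the text)

def volume_fraction(number_density: float, radius: float) -> float:
    """phi_c = rho * v, with v = 4*pi*R_c**3/3 (units must be consistent)."""
    return number_density * (4.0 / 3.0) * math.pi * radius ** 3

def delta_tf_ratio(phi_1: float, phi_2: float, nu: float = NU) -> float:
    """dT_F(phi_1) / dT_F(phi_2), assuming dT_F is proportional to phi_c**(1/(3*nu))."""
    return (phi_1 / phi_2) ** (1.0 / (3.0 * nu))

rho = 5.0e-5                              # hypothetical crowder number density, nm^-3
phi_small = volume_fraction(rho, 5.0)     # hypothetical R_c = 5 nm
phi_large = volume_fraction(rho, 10.0)    # doubling R_c at the same rho
print(f"phi_c: {phi_small:.3f} -> {phi_large:.3f}")
print(f"dT_F grows by a factor {delta_tf_ratio(phi_large, phi_small):.2f}")
```

Doubling R_c at fixed ρ raises φ_c eightfold, and under the assumed scaling the stability gain grows by 8^{5/9} ≈ 3.2 rather than eightfold, because the exponent 1/(3ν) is less than one.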

In an insightful application of simulations, it was recently shown that crowding can alter the
catalytic activity of a kinase (Fig. 4) [79]. As is common in many kinases, PGK, which transfers a
phosphate group from diphosphoglycerate to ADP, has a catalytic site between the N- and
C-lobes connected by a flexible hinge. To perform its kinase activity, PGK must undergo a
large-scale structural movement that reduces the distance between the N- and C-lobes. It was
found that PGK activity increases over 15-fold in 200 mg/ml (φ_c ≈ 0.2) Ficoll 70. The
enhancement in activity was attributed to crowding-induced shape change that brings the 
N and C lobes in proximity. 
Biological nanomachines

Biological machines are typically multi-subunit constructs that carry out a myriad of
functions by interacting with a range of proteins and RNA. Examples of such machines
include molecular motors (kinesin, myosin, and dynein), the E. coli chaperonin GroEL,
FoF1-ATPase, ribosomes, and helicases. A common theme in the function of these systems
is that they consume energy and in the process undergo a reaction cycle that dictates their
function. Free energy transduction from chemical energy to mechanical work via a series of
conformational switches is the hallmark of biological nanomachines [80].

Chaperonin GroEL: Most proteins in cells fold spontaneously. However, molecular chaperones
have evolved to rescue the small fraction of proteins that do not reach their
native states easily and hence are destined to aggregate. In E. coli it is estimated that only
about 5-10% of the proteins [81] require assistance from the chaperonin GroEL, which
has been extensively characterized using experiments and simulations [82].

GroEL has two heptameric rings that are stacked back-to-back [83], with each subunit
consisting of apical (A), intermediate (I), and equatorial (E) domains. During the reaction
cycle, GroEL undergoes a series of structural (allosteric) transitions upon binding
of the substrate protein (SP), ATP, and the co-chaperonin GroES (Fig. 5). In the T state, the
hydrophobic patches in the A domain recognize the exposed hydrophobic residues of the misfolded
SPs. ATP binding triggers dramatic domain movements in GroEL, resulting in the catalytic sites
moving apart, which in turn imparts a stretching force to partially unfold the captured SP. This
step is followed by GroES binding, which results in the encapsulation of the SP in the central
cavity. The extent of structural changes at the molecular level in each subunit of GroEL
(each ring has ≈ 3850 residues) during the reaction cycle (T → R → R″ → T) was revealed
only through CG simulations [18].

Simulations of the entire heptameric GroEL particle using the SOP model vividly illustrated
the conformational changes of GroEL triggered by ATP binding (T → R) and ATP
hydrolysis (R → R''). Multiple simulation trajectories provided an unprecedented view of
the key interactions that drive the allosteric transitions [18]: (i) the A domains rotate
counterclockwise in the T → R transition and clockwise in the R → R'' transition; (ii) the global
T → R and R → R'' transitions follow two-state kinetics, while the formation and disruption
of residue pairs occur over a broad range of time scales, so that an underlying kinetic
hierarchy of internal dynamics governs the global transitions; (iii) in both the T → R and
R → R'' transitions, the disruption and formation of salt bridges are coordinated at multiple
sites, which mediate communication between neighboring subunits and synchronize
the dynamics of the heptameric ring; (iv) there is a spectacular outside-in movement of
two helices, both solvent exposed in the R state, accompanied by interdomain salt-bridge
formation. As a result, the microenvironment of the SP, which is predominantly
hydrophobic in the T state, becomes progressively hydrophilic as the reaction cycle proceeds.
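The SOP model is commonly written as a sum of finitely extensible nonlinear elastic (FENE) bond terms, Lennard-Jones-like attractions between residues that are in contact in the native structure, and soft repulsions for all other pairs. The sketch below evaluates an energy function of that form for a single chain of Cα beads. The parameter values (spring constant, FENE range, contact strength, bead size, and the 8 Å native-contact cutoff) are illustrative assumptions, not the values used in [18].

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def sop_energy(
    coords: Sequence[Vec],
    native: Sequence[Vec],
    k: float = 20.0,     # FENE spring constant, kcal/(mol A^2) (illustrative)
    R0: float = 2.0,     # maximum allowed bond deviation, A (illustrative)
    eps_h: float = 1.5,  # native-contact strength, kcal/mol (illustrative)
    eps_l: float = 1.0,  # repulsion strength, kcal/mol (illustrative)
    sigma: float = 3.8,  # repulsive bead size, A (illustrative)
    r_cut: float = 8.0,  # native-contact cutoff, A (illustrative)
) -> float:
    """SOP-style energy of one chain of C-alpha beads; coordinates in Angstrom."""
    n = len(coords)
    if len(native) != n:
        raise ValueError("coords and native must have the same length")
    energy = 0.0
    # FENE bonds between consecutive beads, relative to the native bond lengths.
    for i in range(n - 1):
        dr = math.dist(coords[i], coords[i + 1]) - math.dist(native[i], native[i + 1])
        if abs(dr) >= R0:
            return math.inf  # bond stretched beyond the FENE limit
        energy -= 0.5 * k * R0**2 * math.log(1.0 - (dr / R0) ** 2)
    # Purely repulsive (i, i+2) term that keeps the chain from folding back on itself.
    for i in range(n - 2):
        energy += eps_l * (sigma / math.dist(coords[i], coords[i + 2])) ** 6
    # Pairs with |i - j| >= 3: attractive if in native contact, repulsive otherwise.
    for i in range(n - 3):
        for j in range(i + 3, n):
            r = math.dist(coords[i], coords[j])
            r0 = math.dist(native[i], native[j])
            if r0 < r_cut:
                x = (r0 / r) ** 6
                energy += eps_h * (x * x - 2.0 * x)  # minimum of -eps_h at r = r0
            else:
                energy += eps_l * (sigma / r) ** 6
    return energy
```

The double loop is O(N²) and kept for clarity; a simulation of a particle the size of GroEL would use neighbor lists.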

These large-scale conformational changes are linked to function. Misfolded proteins,
which typically have exposed hydrophobic regions, are captured as long as they are presented
to GroEL. In the T → R transition, and even more dramatically in the R → R'' transition,
the structural changes in the GroEL particle cause the interactions between the SP and
GroEL to change from favorable in the T state to unfavorable in the R'' state. These changes
in the microenvironment place the SP in a different part of the folding landscape,
from which it can fold with some probability during the lifetime of the R and R'' states
(Fig. 5). If the cycle is iterated multiple times, a sufficient yield of the SP can be obtained, as
anticipated by the Iterative Annealing Mechanism (IAM) [84, 85]. It is amusing to note that
the mechanism of GroEL function is hauntingly similar to the simulated annealing protocol
[86] used in the context of NP-hard problems. Not surprisingly, nature apparently stumbled upon
it millions of years earlier.
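The iterative annealing argument can be made quantitative with a simple model. Assuming that in each cycle a fixed fraction Ψ of the bound SP folds, while the remainder is released in a misfolded form and recaptured, the folded fraction after n cycles is $1-(1-\Psi)^n$. The sketch below evaluates this expression; treating Ψ as constant from cycle to cycle is a simplifying assumption.

```python
import math


def iam_yield(psi: float, n_cycles: int) -> float:
    """Fraction of SP folded after n_cycles, assuming a constant per-cycle folding fraction psi."""
    if not 0.0 <= psi <= 1.0 or n_cycles < 0:
        raise ValueError("need 0 <= psi <= 1 and n_cycles >= 0")
    # In each cycle a fraction (1 - psi) remains misfolded and is recaptured.
    return 1.0 - (1.0 - psi) ** n_cycles


def cycles_for_yield(psi: float, target: float) -> int:
    """Smallest number of cycles n for which iam_yield(psi, n) >= target."""
    if not 0.0 < psi <= 1.0 or not 0.0 <= target < 1.0:
        raise ValueError("need 0 < psi <= 1 and 0 <= target < 1")
    if target == 0.0:
        return 0
    if psi == 1.0:
        return 1
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - psi))
```

Even a small per-cycle probability gives a high yield after enough iterations: with Ψ = 0.05, a 90% yield requires 45 cycles.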

Kinesins: Kinesins are motors that transport cellular organelles along the network of
cytoskeletal filaments [80, 87, 88]. Made of two identical motor domains linked by a coiled-coil
stalk, kinesins exploit the free energy generated from binding and hydrolysis of ATP to
produce their characteristic hand-over-hand stepping motion. A number of SM experiments
show that kinesin takes roughly 8-nm steps along the polar microtubule (MT) track as it
strides towards the (+) end, consuming one ATP per step.
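An 8-nm step per ATP allows a back-of-the-envelope energy balance for kinesin. The sketch below compares the work done against an opposing load during one step with the free energy of one ATP. The loads, the stepping rate, and the value ΔG_ATP ≈ 20 k_BT are illustrative assumptions, not values taken from the studies cited here.

```python
KBT_PN_NM = 4.114  # thermal energy k_B*T at about 298 K, in pN*nm


def work_per_step_kbt(load_pn: float, step_nm: float = 8.0) -> float:
    """Work done against an opposing load during one step, in units of k_B*T."""
    return load_pn * step_nm / KBT_PN_NM


def efficiency(load_pn: float, dg_atp_kbt: float = 20.0, step_nm: float = 8.0) -> float:
    """Fraction of the free energy of one ATP converted into work, assuming one ATP per step.

    dg_atp_kbt is an assumed, illustrative free energy of ATP hydrolysis in units of k_B*T.
    """
    return work_per_step_kbt(load_pn, step_nm) / dg_atp_kbt


def velocity_nm_per_s(steps_per_s: float, step_nm: float = 8.0) -> float:
    """Mean velocity if the motor completes steps_per_s steps (and ATP cycles) per second."""
    return steps_per_s * step_nm


for load in (0.0, 2.0, 4.0, 6.0):  # hypothetical opposing loads, pN
    print(f"F = {load:.0f} pN: W = {work_per_step_kbt(load):.1f} kBT, "
          f"efficiency ~ {efficiency(load):.2f}")
```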

Because of fundamental limitations in experimental resolution, it is difficult to provide
molecular explanations of many intriguing observations related to kinesin motility, such as
free energy transduction, the out-of-phase coordination of the processes occurring at the
two motor domains, and the role of kinesin-MT interactions. When both heads are
associated with the MT binding sites, the internal tension (8-15 pN) exerted through the
neck-linker deforms the catalytic site of the leading head from its native-like configuration [23],
thus inhibiting the premature binding of ATP to the nucleotide-free leading head. The ATP-inhibited
state is maintained as long as the two heads remain bound. The deformed catalytic site of the
leading head is restored only after the inorganic phosphate (Pi) is released, which changes the
trailing head from a strong to a weak binding state. Thus, the processivity of kinesin is regulated
by strain in the leading head, which can be linked to the topology of the kinesin-MT complex.
Simplified molecular simulations combined with theoretical ideas have also shed light on
the vexing question of whether kinesin takes substeps (Fig. 6) [24].

Transcription initiation by bacterial RNA polymerase: The synthesis of RNA,
carried out by DNA-dependent RNA polymerase (RNAP) in a process referred to as
transcription, involves several stages. The highly regulated transcription process in eukaryotes
is extraordinarily complicated, involving a whole zoo of transcription factors that interact
with the polymerase as it reads the code on the template strand of DNA to make RNA (Fig. 7).
Transcription in bacteria also involves a number of steps. The DNA-dependent RNAP, whose
sequence, structure, and global functions are universally conserved from bacteria to man, is
the key enzyme in the transcription of genetic information in all organisms. The transcription
cycle has three major stages: (i) Initiation, during which the initiation-specific σ factor binds
to the catalytically competent core RNAP to form the holoenzyme. This step is followed by
recognition of the promoter DNA to form the closed (R·Pc) complex and the subsequent
transition to the open (R·Po) complex. (ii) Elongation of the transcript by nucleotide addition.
(iii) Termination, involving cessation of transcription and disassembly of the RNAP elongation
complex.

Recently, the dynamics of the structural transitions that occur during the R·Pc → R·Po
transition, which leads to melting of 12 base pairs in the promoter region and the
formation of the transcription bubble (Fig. 7b), were probed using CG simulations [89].
To perform these simulations, a CG model of the 3,122-residue RNAP-DNA complex (15
nm long and 11 nm wide), identical to those used to describe GroEL and kinesin
dynamics, was used. For DNA, each nucleotide in each strand was represented by a single site
located at the center of the nucleotide. The transcription bubble forms in three steps (Fig. 7c):
(i) melting of the −10 element in the promoter region; (ii) scrunching of the promoter DNA into
the RNAP active channel, followed by the formation of the bubble, where the accommodation of
dsDNA into the channel involves internal RNAP dynamics, namely a transient expansion of key
structural motifs in the β subunit; (iii) bending of the downstream DNA after the unwinding of
the dsDNA. The simulation results revealed that internal RNAP dynamics, which produce a
transient buildup of strain, are needed to fully accommodate the dsDNA and gain access to the
active site. The simulations (for animation see http://www.youtube.com/watch?v=Q6QoyD13TCw)
also make several testable predictions to probe the relationship between RNAP motion and
transcription bubble formation.
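Because each nucleotide is represented by a single site, the extent of promoter melting can be monitored directly from CG coordinates. A minimal sketch, assuming that the template and non-template sites are listed so that entry i of each forms base pair i, and that a pair counts as broken when the distance between its two sites exceeds a cutoff. The cutoff is a hypothetical parameter that would need calibration; it is not taken from [89].

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[float, float, float]


def melted_pairs(template: Sequence[Vec], nontemplate: Sequence[Vec], cutoff_nm: float) -> List[int]:
    """Indices of base pairs whose two CG sites are farther apart than cutoff_nm.

    Assumes entry i of `template` pairs with entry i of `nontemplate`.
    """
    if len(template) != len(nontemplate):
        raise ValueError("strands must have the same number of paired sites")
    return [i for i, (t, n) in enumerate(zip(template, nontemplate))
            if math.dist(t, n) > cutoff_nm]


def bubble_size(melted: Sequence[int]) -> int:
    """Length of the longest run of consecutive melted pairs (indices sorted ascending)."""
    best = run = 0
    prev: Optional[int] = None
    for i in melted:
        run = run + 1 if prev is not None and i == prev + 1 else 1
        best = max(best, run)
        prev = i
    return best
```

Tracking `bubble_size` along a trajectory gives a simple order parameter for the melting, scrunching, and bending stages.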

Outlook 

Given that biological problems are complex, it is inevitable that CG models will play a
key role in informing experiments. Although not reviewed here, there are a number of areas,
such as protein aggregation, membrane structure and dynamics [90], and lipid-membrane
interactions, where such simulations have already been profitable [91]. Experimental
constraints and theory have been the guiding factors in constructing length-scale-dependent CG
models, as the few examples here illustrate. Current methods can be used to provide insights
into a number of important biological problems. On length scales of a few nm, corresponding
to folding problems, there is a need to study proteins in excess of 200 residues. Modeling
counterion effects to describe RNA folding can be achieved by integrating theoretical ideas
from polymer physics and suitable CG models. Although electrostatic interactions have
been approximately modeled in simulations of biological machines [18], further refinements
might be needed for more accurate simulations [92]. On longer length scales, there are a number
of problems that could profit from CG simulations. The description of motor-driven
polymerization and depolymerization kinetics of microtubules and of protein-induced
polymerization of actin are two examples.

The demand to develop CG models will continue to grow because there is an appetite 
to understand the workings of a cell. The increasing attention paid to obtain real-time 
measurements on how the workers (enzymes, ribozymes, ribosomes, genomes, lipids, mem- 
branes etc) cooperate to execute the demands on the cell is sure to spur interests in models 
and theories. From a modeling perspective, it is neither possible nor desirable to devise 
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microscopic models when considering events on long length and time scales. In constructing 
whole-cell models, it may be sufficient to model the various workers as quasiparticles, 
which interact with each other through connected networks that are dynamically changing 
depending upon the cell status and external stimuli. Such a viewpoint is already being 
used in systems biology. The lesson from theoretical approaches to problems in condensed 
matter and material science is that phenomena at different length and time scales require 
different levels of description. Such a perspective, which also applies to biological problems, 
will surely spur us on to develop suitable CG models and theories that capture the essence 
of the problem at hand without being encumbered by unnecessary details. 
Box 1: Genre of CG models. Formally, one can think of coarse-graining as a process by which
effective energy functions are generated from a microscopic Hamiltonian by integrating over
irrelevant degrees of freedom. The block spin renormalization procedure in the Ising model
shows that, as the degrees of freedom are thinned, multiparticle interactions are generated.
Similar ideas can be used to construct an effective Hamiltonian by insisting that the partition
functions of the CG and the microscopic Hamiltonians be the same. However, given the large
inaccuracies in force fields, intuitive and physical considerations have proved far more
profitable in guiding the development of CG models. The CG strategy is successful because
the characteristic time scales at each length scale are well separated from each other.
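The block spin procedure can be illustrated on a two-dimensional Ising configuration. The sketch below applies a majority rule to b × b blocks, which thins the degrees of freedom by a factor of b². It shows only the mapping of configurations, not the derivation of the renormalized couplings, which is where the multiparticle interactions appear. The block size and the random tie-breaking rule for even b are assumptions of this illustration.

```python
import random
from typing import List, Optional

Lattice = List[List[int]]  # square lattice of Ising spins, each +1 or -1


def block_spin(spins: Lattice, b: int = 3, rng: Optional[random.Random] = None) -> Lattice:
    """Majority-rule block spin transformation of an L x L lattice (L divisible by b)."""
    L = len(spins)
    if any(len(row) != L for row in spins) or L % b != 0:
        raise ValueError("lattice must be square with a side divisible by b")
    if rng is None:
        rng = random.Random(0)  # only used to break ties, which can occur when b is even
    coarse: Lattice = []
    for I in range(0, L, b):
        row = []
        for J in range(0, L, b):
            # The sign of the summed block spins becomes the new coarse-grained spin.
            total = sum(spins[i][j] for i in range(I, I + b) for j in range(J, J + b))
            if total > 0:
                row.append(1)
            elif total < 0:
                row.append(-1)
            else:
                row.append(rng.choice((-1, 1)))
        coarse.append(row)
    return coarse


def magnetization(spins: Lattice) -> float:
    """Mean spin per site."""
    return sum(map(sum, spins)) / sum(len(row) for row in spins)
```

In a full renormalization-group calculation one would then ask which effective Hamiltonian, generally containing multispin couplings, reproduces the statistics of the coarse spins.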

In response to the challenge of describing biological processes that span several orders of
magnitude in time and length scales, a variety of CG models for DNA, RNA, and proteins
have been proposed. Although CG models have been prevalent in the polymer literature
for over fifty years, their use for proteins began in earnest with the pioneering work of Levitt
and Warshel [11]. The efficacy of off-lattice models for protein folding kinetics was first
demonstrated by Honeycutt and Thirumalai [12]. In all CG models, polypeptide chains
and nucleic acids are represented using a reduced description. The accompanying figure
shows a few examples. a. Three Interaction Site (TIS) model for an RNA hairpin, obtained
by representing each nucleotide by three sites, one each for the phosphate, ribose, and base.
b. Cα-SCM for the WW domain, obtained by replacing each amino acid by two sites, one centered
on the α-carbon and the other at the center of the side chain. c. Water-mediated interactions
can also be captured using effective potentials [93]. Using these representations and their
variations, a number of types of CG models have been developed. The common unifying
aspect of all these models is that the nucleotides and amino acids are represented by only
a few interaction sites. They differ, however, in the details, especially in the number of
interaction centers per nucleotide or amino acid needed to encode the folded structure.
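The Cα-SCM mapping described above can be sketched as follows, assuming each residue is given as a dictionary from PDB-style heavy-atom names (such as "CA" and "CB") to Cartesian coordinates, with hydrogens already removed. Placing the side-chain bead at the geometric centroid of the side-chain heavy atoms is one common choice and is an assumption here; a center of mass would serve equally well. Glycine, which has no side-chain heavy atoms, receives only a Cα bead.

```python
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, float, float]
BACKBONE = {"N", "CA", "C", "O", "OXT"}  # heavy atoms excluded from the side-chain bead


def centroid(points: Sequence[Vec]) -> Vec:
    """Geometric center of a non-empty set of points."""
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


def ca_scm_beads(residues: Sequence[Dict[str, Vec]]) -> List[Tuple[str, Vec]]:
    """Map each residue (atom name -> heavy-atom coordinate) onto C-alpha and side-chain beads."""
    beads: List[Tuple[str, Vec]] = []
    for res in residues:
        beads.append(("CA", res["CA"]))
        side_chain = [xyz for name, xyz in res.items() if name not in BACKBONE]
        if side_chain:  # glycine has no side-chain heavy atoms and keeps only its C-alpha bead
            beads.append(("SC", centroid(side_chain)))
    return beads
```

The same pattern, with different atom groupings, yields the three-site TIS representation of nucleotides.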

A major advantage of CG models is that their conformational space can be exhaustively
sampled. However, even with this simplification, accurate thermodynamic results can
only be obtained using enhanced sampling methods. To this end, simulations of CG
models have used replica exchange and multicanonical methods. In addition, low-friction
Langevin dynamics has been used to efficiently sample conformational space.
These methods are especially necessary for simulating proteins with complex topology. To
obtain kinetic information, such as folding times or transition times between allosteric states,
Brownian dynamics (BD) simulations are typically performed. In typical BD simulations
the Brownian time is $\tau_H \approx \zeta_H a^2/(k_B T_s)$, where $\zeta_H$ is the friction constant, $a$ is
roughly the size of a coarse-grained bead, $k_B$ is Boltzmann's constant, and $T_s$ is the
simulation temperature. Estimates of these quantities [94] have been used to map simulation
times to real times in a number of applications.
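The Brownian time $\tau_H \approx \zeta_H a^2/(k_B T_s)$ can be evaluated directly to convert reduced simulation time into physical time. The sketch below estimates the friction constant from Stokes' law for a bead of radius a/2 in water. The viscosity, the bead size a = 3.8 Å (roughly the Cα-Cα distance), the Stokes radius, and the temperature are illustrative assumptions, not the estimates of [94].

```python
import math

KB = 1.380649e-23  # Boltzmann constant, J/K


def stokes_friction(viscosity_pa_s: float, radius_m: float) -> float:
    """Stokes friction coefficient zeta = 6*pi*eta*r, in kg/s."""
    return 6.0 * math.pi * viscosity_pa_s * radius_m


def brownian_time(zeta: float, a_m: float, temperature_k: float) -> float:
    """tau_H = zeta * a^2 / (k_B * T_s), in seconds."""
    return zeta * a_m**2 / (KB * temperature_k)


def to_real_time(n_steps: int, dt_reduced: float, tau_h: float) -> float:
    """Physical time covered by n_steps of size dt_reduced (in units of tau_H)."""
    return n_steps * dt_reduced * tau_h


a = 3.8e-10                              # assumed bead size (~C-alpha spacing), m
zeta = stokes_friction(1.0e-3, a / 2.0)  # assumed water-like viscosity and Stokes radius a/2
tau_h = brownian_time(zeta, a, 300.0)    # assumed simulation temperature, K
print(f"tau_H = {tau_h:.2e} s")
```

With these assumptions τ_H is on the order of 0.1 ns; a different choice of bead size or friction model changes it accordingly, which is why the mapping to real time is always an estimate.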
Figure Captions

Fig. 1 DNA applications. a. Loop formation times between two regions in dsDNA
separated by s along the contour, from simulations using a CG model that represents a
single pitch of the DNA helix as a monomer unit. Lines are theoretical results. b. Extension
as a function of mechanical force for 97-kb λ-DNA. Symbols are experimental results and
the dashed line is the fit using the WLC model. c. Model of bacterial chromosome segregation
from simulations of a tightly confined polymer chain. The newly synthesized DNA (blue
and red) is extruded to the periphery of the unreplicated nucleoid (grey), and the two
strings of blobs drift apart and segregate due to excluded-volume interactions and
conformational entropy. d. The top figure shows the scaling law $P(s) \sim s^{-1.08}$, where
$P(s)$ is the contact probability for a given genomic distance $s$, measured by Hi-C, a method that
probes the three-dimensional architecture of whole genomes by coupling proximity-based
ligation with massively parallel sequencing. The exponent of the power-law decay is distinct
from $s^{-3/2}$ for an equilibrated globule (bottom left), whereas the $s^{-1}$ scaling (dashed lines
from CG simulations in the top figure) is explained using a fractal globule (bottom right),
a knot-free polymer conformation that enables reversible folding and unfolding at any
genomic locus. Figures a-d were adapted from [42], [34], [48], and [51], respectively.

Fig. 2 Ribozyme to RNA hairpin folding. a. Left: secondary structure map
of the Tetrahymena group I intron, where the circles show that 5-6 nucleotides are
represented by one interaction center. Simulations of the CG model are used to obtain the
best agreement with time-dependent SAXS signals as the ribozyme folds. Representative
structures that produce the best agreement with experiments are shown [57]. b. Refolding
pathways of an RNA hairpin upon quenching the force from a high to a low value (left) and
upon a temperature quench (right), obtained using the SOP model. Upon force quench, folding
commences from an extended (E) state by forming the turn, which nucleates hairpin
formation (left). However, folding occurs by multiple pathways upon temperature quench.

Fig. 3 Protein folding. a. Dependence of the fraction of molecules in the Native Basin
of Attraction on Guanidinium Chloride concentration for protein L (blue) and
Cold Shock protein (red), where the symbols are data from experiments and the lines are
results from Molecular Transfer Model simulations. b. Unfolding of GFP, a ≈250-residue
11-stranded β-barrel protein (left). Forced unfolding obtained from CG simulations (right)
shows pathway bifurcation. The structures from the simulations at various stages of
unfolding are also shown. c. The volume of the exit tunnel (left) and a helix (right) are shown
in the ribosome structure. Cotranslational folding of a protein occurs as it is synthesized
by the ribosome.

Fig. 4 Crowding effects on folding. a. Crowding-induced entropic stabilization of
the folded states of proteins. Restriction of the extended denatured state ensemble by the
volume occupied by crowding agents raises its free energy to a greater extent than that of the
folded state. The structures are from Cα-SCM simulations of a three-stranded β-sheet protein,
the WW domain. b. Folding time as a function of the concentration of the crowding agent (black
lines) for phosphoglycerate kinase (PGK). The red curve shows folding of the WW domain as
a function of the crowder volume fraction $\phi_c$. The comparison is meant to illustrate that CG
simulations qualitatively explain the measurements. c. Structure of PGK in the absence of
crowding agent (left) and at $\phi_c = 0.25$ (right). The distance between the N- and C-terminal
lobes is dramatically reduced by crowding. The functional implications are given in the text
and in [79]. Figures a-c were adapted from [32] and [79].

Fig. 5 Reaction cycle and GroEL function. The hemicycle of the GroEL reaction
cycle shows that a misfolded substrate protein (SP) is captured by GroEL (the E. coli
chaperonin) in the T state. This step is followed by a reversible ATP-driven transition to the
R state, to which the co-chaperonin GroES can bind to form the R' complex, which also
results in the SP being encapsulated in the GroEL cavity. The SP can fold by the kinetic
partitioning mechanism (KPM). Hydrolysis of ATP, which results in R'' formation, is followed
by an allosteric signal from the bottom ring that leads to the release of ADP, GroES, and the SP
(folded or not), thus resetting the top ring to the T state.

Fig. 6 Mechanochemical cycle of conventional kinesin. The diagram depicts
the enzymatic cycle of a dimeric kinesin that generates a single 8-nm step on the MT track.
Head-to-head regulation via the neck-linker results in the out-of-phase coordination of the
catalytic cycles. T, DP, D, and φ denote ATP, ADP-Pi, ADP, and the nucleotide-free state of
the catalytic site. The yellow arrow represents the ordered state of the neck-linker.

Fig. 7 Promoter melting induced by bacterial RNA polymerase. a. Schematic
of the base pairing between the template and non-template strands of the promoter.
Nucleotide positions are numbered relative to the transcription start site, +1. The DNA
segments that interact with RNAP, the −35 and −10 elements, are shaded red. b. Structural
models corresponding to R·Pc (left) and R·Po (right). The transcription bubble structure is
on the bottom right (DNA non-template strand (yellow) and template strand (green)).
c. Sequence of events in transcription bubble formation (melting, scrunching, and bending),
from top to bottom, extracted from simulations. Blue lines show the changes in the
conformation of the template strand in the three major stages of transcription bubble
formation. Yellow circles embedded within the polymerase represent the position of Mg²⁺.
Red and green circles mark the positions of the nucleotides and highlight the processes of
melting, scrunching, and bending.